Sparse Learning for Stochastic Composite Optimization
Authors
Abstract
In this paper, we focus on Stochastic Composite Optimization (SCO) for sparse learning, which aims to learn a sparse solution. Although many SCO algorithms with an optimal convergence rate of O(1/T) have been developed for sparse learning, they often fail to deliver a sparse solution in the end, either because sparsity regularization has only a limited effect during stochastic optimization or because of limitations in the online-to-batch conversion. To improve the sparsity of solutions obtained by SCO, we propose a simple but effective stochastic optimization scheme that adds a novel sparse online-to-batch conversion to traditional SCO algorithms. Our theoretical analysis shows that the scheme finds a solution with a better sparsity pattern without affecting the convergence rate. Experimental results on both synthetic and real-world data sets show that the proposed methods recover sparse solutions more effectively and converge at a rate comparable to state-of-the-art SCO algorithms for sparse learning.
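As a rough illustration of the pattern the abstract describes (not the authors' exact algorithm), the sketch below runs a standard composite method, here proximal SGD with l1 soft-thresholding, and then applies a sparsity-aware conversion to the averaged iterate instead of returning the dense average directly. The oracle sample_grad, the step-size schedule, and the final thresholding rule are all assumptions made for this sketch.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def sparse_sco(sample_grad, dim, lam, T, eta0=1.0):
    """Proximal SGD for min_w E[f(w; xi)] + lam * ||w||_1, followed by a
    sparsity-aware online-to-batch conversion (assumed rule, illustration only)."""
    w = np.zeros(dim)
    w_sum = np.zeros(dim)
    for t in range(1, T + 1):
        eta = eta0 / np.sqrt(t)        # standard decaying step size
        g = sample_grad(w)             # stochastic gradient of the smooth part
        w = soft_threshold(w - eta * g, eta * lam)
        w_sum += w
    w_bar = w_sum / T                  # plain averaging destroys sparsity,
    # so re-sparsify the average with one more thresholding pass (assumed rule)
    return soft_threshold(w_bar, lam / np.sqrt(T))
```

The final step matters because averaging sparse iterates generally yields a dense vector, which is exactly the online-to-batch weakness the abstract points to.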
Similar papers
A Sparsity Preserving Stochastic Gradient Method for Composite Optimization
We propose new stochastic gradient algorithms for solving convex composite optimization problems. In each iteration, our algorithms query a stochastic oracle for the gradient of the smooth component of the objective function. Our algorithms are based on a stochastic version of the estimate sequence technique introduced by Nesterov (Introductory Lectures on Convex Optimization: A Basic Course, ...
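The primitive such estimate-sequence methods build on is a stochastic composite-gradient step: a gradient step on the smooth component followed by an exact proximal step on the nonsmooth one. Below is a minimal Nesterov-style accelerated variant for an l1 regularizer; it is a generic stand-in, not the algorithm of that paper, and the smoothness constant L and the weighting scheme are illustrative assumptions.

```python
import numpy as np

def prox_l1(v, tau):
    """Exact proximal step for the nonsmooth l1 term."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def acc_stoch_composite(stoch_grad, dim, lam, T, L=10.0):
    """Nesterov-style accelerated stochastic proximal gradient for
    min_w E[f(w; xi)] + lam * ||w||_1 (a generic sketch, not the paper's
    estimate-sequence algorithm)."""
    w = np.zeros(dim)
    z = np.zeros(dim)
    for t in range(1, T + 1):
        alpha = 2.0 / (t + 1)
        y = (1 - alpha) * w + alpha * z        # extrapolation point
        g = stoch_grad(y)                      # stochastic gradient at y
        step = 1.0 / (alpha * L)
        z = prox_l1(z - step * g, step * lam)  # prox step on the z-sequence
        w = (1 - alpha) * w + alpha * z        # convex combination
    return w
```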
Estimation, Optimization, and Parallelism when Data is Sparse
We study stochastic optimization problems in which the data is sparse, a setting that is in a sense dual to current perspectives on high-dimensional statistical learning and optimization. We highlight both the difficulties, in terms of the increased sample complexity that sparse data necessitates, and the potential benefits, in terms of allowing parallelism and asynchrony in the design of algorithms. Concretely, ...
Nonconvex Sparse Learning via Stochastic Optimization with Progressive Variance Reduction
We propose a stochastic variance reduced optimization algorithm for solving sparse learning problems with cardinality constraints. Sufficient conditions are provided, under which the proposed algorithm enjoys strong linear convergence guarantees and optimal estimation accuracy in high dimensions. We further extend the proposed algorithm to an asynchronous parallel variant with a near linear spe...
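A hedged sketch of the general recipe follows: SVRG-style variance reduction combined with hard thresholding to maintain a cardinality constraint ||w||_0 <= s. The paper's actual algorithm, its sufficient conditions, and the asynchronous parallel variant are more involved than this serial version; grad_i, m, and eta are assumed interfaces and parameters.

```python
import numpy as np

def hard_threshold(w, s):
    """Keep the s largest-magnitude coordinates of w; zero out the rest."""
    out = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-s:]
    out[idx] = w[idx]
    return out

def svrg_ht(grad_i, n, dim, s, T, m, eta):
    """SVRG with hard thresholding under a cardinality constraint
    ||w||_0 <= s (serial, illustrative version).
    grad_i(w, i) is assumed to return the gradient of the i-th component."""
    w_snap = np.zeros(dim)
    for _ in range(T):
        # full gradient at the snapshot point
        mu = np.mean([grad_i(w_snap, i) for i in range(n)], axis=0)
        w = w_snap.copy()
        for _ in range(m):
            i = np.random.randint(n)
            v = grad_i(w, i) - grad_i(w_snap, i) + mu  # variance-reduced gradient
            w = hard_threshold(w - eta * v, s)
        w_snap = w
    return w_snap
```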
Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization
Due to their simplicity and excellent performance, parallel asynchronous variants of stochastic gradient descent have become popular methods to solve a wide range of large-scale optimization problems on multi-core architectures. Yet, despite their practical success, support for nonsmooth objectives is still lacking, making them unsuitable for many problems of interest in machine learning, such ...
Stabilized Sparse Online Learning for Sparse Data
Stochastic gradient descent (SGD) is commonly used for optimization in large-scale machine learning problems. Langford et al. (2009) introduce a sparse online learning method to induce sparsity via truncated gradient. With high-dimensional sparse data, however, this method suffers from slow convergence and high variance due to heterogeneity in feature sparsity. To mitigate this issue, we introd...
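For context, the truncated-gradient update of Langford et al. (2009) that this abstract builds on can be sketched as follows; the gravity parameter g, truncation period K, and threshold theta use illustrative values, and the stabilization proposed in the paper is not shown.

```python
import numpy as np

def truncate(v, alpha, theta):
    """Truncation operator of Langford et al. (2009): shrink coordinates
    whose magnitude is at most theta toward zero by alpha; leave the rest."""
    shrunk = np.sign(v) * np.maximum(np.abs(v) - alpha, 0.0)
    return np.where(np.abs(v) <= theta, shrunk, v)

def truncated_gradient_sgd(sample_grad, dim, T, eta=0.1, g=0.01, K=10, theta=1.0):
    """SGD with truncation applied every K steps to induce sparsity
    (parameter values are illustrative, not from the paper)."""
    w = np.zeros(dim)
    for t in range(1, T + 1):
        w = w - eta * sample_grad(w)             # plain stochastic gradient step
        if t % K == 0:                           # periodic truncation with
            w = truncate(w, K * eta * g, theta)  # accumulated gravity K * eta * g
    return w
```

Truncating only every K steps, rather than at every iteration, is what keeps the per-step cost low while still driving small coordinates to exact zeros.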